The community structure of complex networks reveals both their organization and hidden relationships 
among their constituents. Most community detection methods currently available are not deterministic, 
and their results typically depend on the specific random seeds, initial conditions and tie-break rules 
adopted for their execution. Consensus clustering is used in data analysis to generate stable results out of a 
set of partitions delivered by stochastic methods. Here we show that consensus clustering can be combined 
with any existing method in a self-consistent way, enhancing considerably both the stability and the accuracy 
of the resulting partitions. This framework is also particularly suitable to monitor the evolution of 
community structure in temporal networks. An application of consensus clustering to a large citation 
network of physics papers demonstrates its capability to keep track of the birth, death and diversification of 
topics. 
Network systems 1-8 typically display a modular organization, reflecting the existence of special affinities 
among vertices in the same module, which may be a consequence of their having similar features or the 
same roles in the network. Such affinities are revealed by a considerably larger density of edges within 
modules than between modules. This property is called community structure or graph clustering 9-13: detecting the 
modules (also called clusters or communities) may uncover similarity classes of vertices, the organization of the 
system and the function of its parts. 

The community structure of complex networks is still rather elusive. The definition of community is controversial, 
and should be adapted to the particular class of systems/problems one considers. Consequently it is not yet 
clear how scholars can test and validate community detection methods, although the issue has lately received 
some attention 14-18. Also, in order to deliver possibly more reliable results, methods should ideally exploit all 
features of the system, like edge directedness and weight (for directed and weighted networks, respectively), and 
account for properties of the partitions, like hierarchy 19,20 and community overlaps 21,22. Very few methods are 
capable of taking all these factors into consideration 23,24. Another important barrier is the computational complexity 
of the algorithms, which keeps many of them from being applied to networks with millions of vertices or more. 

In this paper we focus on another major problem affecting clustering techniques. Most of them, in fact, do not 
deliver a unique answer. The most typical scenario is when the sought partition or individual clusters correspond 
to extrema of a cost function 25-27, whose search can only be carried out with approximation techniques, with results 
depending on random seeds and on the choice of initial conditions. Allegedly deterministic methods may also run 
into similar difficulties. For instance, in divisive clustering methods 9,28 the edges to be removed are the ones 
corresponding to the lowest/highest value of a variable, and there is a non-negligible chance of ties, especially in the 
final stages of the calculation, when many edges have been removed from the system. In such cases one usually 
picks at random from the set of edges with equal (extremal) values, introducing a dependence on random seeds. 

In the presence of several outputs of a given method, is there a partition more representative of the actual 
community structure of the system? If this were the case, one would need a criterion to sort out a specific partition 
and discard all others. A better option is combining the information of the different outputs into a new partition. 
Exploiting the information of different partitions is also very important in the detection of communities in 
dynamic systems 29-32, a problem of growing importance, given the increasing availability of time-stamped 
network datasets 33 . Existing methods typically rely on the analysis of individual snapshots, while the history of 
the system should also play a role 32 . Therefore, combining partitions corresponding to different time windows is 
a promising approach. 

Consensus clustering 34-36 is a well known technique used in data analysis to solve this problem. Typically, the 
goal is searching for the so-called median (or consensus) partition, i.e. the partition that is most similar, on 
average, to all the input partitions. The similarity can be measured in 
several ways, for instance with the Normalized Mutual Information 
(NMI) 37 . In its standard formulation it is a difficult combinatorial 
optimization problem. An alternative greedy strategy 34 , which we 
explore here, uses the consensus matrix, i.e. a matrix based on the 
cooccurrence of vertices in clusters of the input partitions. The con- 
sensus matrix is used as an input for the graph clustering technique 
adopted, leading to a new set of partitions, which generate a new 
consensus matrix, etc., until a unique partition is finally reached, 
which cannot be altered by further iterations. This procedure has 
proven to lead quickly to consistent and stable partitions in real 
networks 38 . 

We stress that our goal is not finding a better optimum for the 
objective function of a given method. Consensus partitions usually 
do not deliver improved optima. On the other hand, global quality 
functions, like modularity 39 , are known to have serious limits 40-42 , 
and their optimization is often unable to detect clusters in realistic 
settings, not even when the clusters are loosely connected to each 
other. In this respect, insisting on finding the absolute optimum of 
the measure would not be productive. However, if we buy the popular 
notion of communities as subgraphs with a high internal edge den- 
sity and a comparatively low external edge density, the task of any 
method would be easier if we managed to further increase the in- 
ternal edge density of the subgraphs, enhancing their cohesion, and 
to further decrease the edge density between the subgraphs, enhan- 
cing their separation. Ideally, if we could push this process to the 
extreme, we would end up with a set of disconnected cliques, which 
every method would be able to identify, despite its limitations. 
Consensus clustering induces this type of transformation (Fig. 1) 
and therefore it mitigates the deficiencies of clustering algorithms, 
leading to more efficient techniques. The situation in a sense recalls 
spectral clustering 43 , where by mapping the original network into a 
set of points in a Euclidean space, through the eigenvector 
components of a given matrix (typically the Laplacian), one ends 
up with a system which is easier to cluster. 

In this paper we present the first systematic study of consensus 
clustering. We show that the consensus partition gets much closer to 
the actual community structure of the system than the partitions 
Figure 1 | Effect of consensus clustering on community structure. 

Schematic illustration of consensus clustering on a graph with two visible 
clusters, whose vertices are indicated by the squares and circles on the (I) 
and (II) diagrams. The combination of the partitions (I), (II), (III) and 
(IV) yields the (weighted) consensus graph illustrated on the right (see 
Methods). The thickness of each edge is proportional to its weight. In the 
consensus graph the cluster structure of the original network is more 
visible: the two communities have become cliques, with "heavy" edges, 
whereas the connections between them are quite weak. Interestingly, this 
improvement has been achieved despite the presence of two inaccurate 
partitions in three clusters (III and IV). 



obtained from the direct application of the chosen clustering 
method. We will also see how to monitor the evolution of clusters 
in temporal networks, by deriving the consensus partition from sev- 
eral snapshots of the system. We demonstrate the power of this 
approach by studying the evolution of topics in the citation network 
of papers published by the American Physical Society (APS). 

Results 

Accuracy. In order to demonstrate the superior performance 
achievable by integrating consensus clustering in a given method, 
we tested the results on artificial benchmark graphs with built-in 
community structure. We chose the LFR benchmark graphs, which 
have become a standard in the evaluation of the performance of 
clustering algorithms 14-18. The LFR benchmark is a generalization 
of the four-groups benchmark proposed by Girvan and Newman, 
which is a particular realization of the planted ℓ-partition model by 
Condon and Karp 44 . LFR graphs are characterized by power law 
distributions of vertex degree and community size, features that 
frequently occur in real world networks. 

The clustering algorithms we used are listed below: 

• Fast greedy modularity optimization. It is a technique developed 
by Clauset et al. 45 , that performs a quick maximization of the 
modularity by Newman and Girvan 39 . The accuracy of the estim- 
ate for the modularity maximum is not very high, but the method 
has been frequently used because it has been one of the first 
techniques able to analyze large networks. We label it here as 
Clauset et al. 

• Modularity optimization via simulated annealing. Here the max- 
imization of modularity is carried out in a more exhaustive (and 
computationally expensive) way. Simulated annealing is a tra- 
ditional technique used in global optimization problems 46 . The 
first application to modularity has been devised by Guimera 
et al. 47 . In contrast to the standard design, we start at zero tem- 
perature. This is necessary because if the method is very stable 
there is no point in using the consensus approach: if the algorithm 
systematically finds the same clusters, the consensus matrix D 
would consist of m disconnected cliques and the successive clus- 
terization of D would yield the same clusters over and over. For 
the method we use the label SA. 

• Louvain method. The goal is still the optimization of modularity, 
by means of a hierarchical approach. First one partitions the 
original network into small communities, so as to maximize modularity 
with respect to local moves of the vertices. These first-generation 
clusters turn into supervertices of a (much) smaller 
weighted graph, where the procedure is iterated, and so on, until 
modularity reaches a maximum. It is a fast method, suitable for 
analyzing very large graphs. However, like all methods based on 
modularity optimization, including the previous two, it is biased 
by the intrinsic limits of modularity maximization 40-42. We refer 
to this method as Louvain. 

• Label propagation method. This method 48 simulates the spreading 
of labels based on the simple rule that at each iteration a given 
vertex takes the most frequent label in its neighborhood. The 
starting configuration is chosen such that every vertex is given a 
different label and the procedure is iterated until convergence. 
This method tends to produce very large clusters, because a few 
labels can propagate over large portions of the graph. We considered 
asynchronous updates, i.e. we update the vertex memberships 
according to the latest memberships of the neighbors. We shall 
refer to this method as LPM. 

• Infomap. The idea behind this method is the same as in cartography: 
dividing the network into areas, like counties/states in a 
map, and recycling the identifiers/names of vertices/towns 
among different areas. The goal is to minimize the description 
length of an infinitely long random walk taking place on the network 23 . 
When the graph has recognizable clusters, most of the time the 
walker will be trapped within a cluster. That way, the additional 
cost of introducing new labels to identify the clusters is compensated 
by the fact that such labels are seldom used to describe the 
process, as transitions between clusters are infrequent, so the 
recycling of the binary identifiers for the vertices among different 
clusters leads to major savings in the description of the random 
walk. We shall refer to this method as Infomap. 
• OSLOM. The method relies on the concept of statistical signifi- 
cance of clusters. The idea here is that, since random graphs are 
not supposed to have clusters, the subgraphs of a network that are 
deemed to be communities should be very different from the 
subgraphs one observes in a random graph with features similar 
to those of the system under study. The statistical significance is then 
estimated through the probability of finding the observed clusters 
in a random network with identical expected degree sequence 24 . 
Clusters are identified by maximizing locally such probability. We 
shall refer to this method as OSLOM. 

All the above techniques can be applied to weighted networks, a 
necessary requisite for our implementation of consensus clustering 
(see Methods). 
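
As an illustration of how one of the listed methods produces run-dependent output, the following is a minimal sketch of asynchronous label propagation on a weighted graph. It is not the implementation used for the tests reported here; the function name `label_propagation`, the adjacency format and the `max_sweeps` safeguard are assumptions made for the example. Ties between equally frequent labels are broken at random, which is one source of the fluctuations discussed above.

```python
import random
from collections import Counter


def label_propagation(adj, seed=None, max_sweeps=100):
    """Asynchronous label propagation on a weighted, undirected graph.

    adj maps each vertex to a dict {neighbor: weight}.
    Returns a dict vertex -> label.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}           # every vertex starts with its own label
    vertices = list(adj)
    for _ in range(max_sweeps):            # safeguard against endless oscillation
        rng.shuffle(vertices)              # random update order
        changed = False
        for v in vertices:
            if not adj[v]:
                continue                   # an isolated vertex keeps its label
            score = Counter()
            for u, w in adj[v].items():
                score[labels[u]] += w      # uses the neighbors' latest labels
            best = max(score.values())
            candidates = [lab for lab, s in score.items() if s == best]
            if labels[v] in candidates:
                continue                   # already holds a most frequent label
            labels[v] = rng.choice(candidates)   # random tie-break
            changed = True
        if not changed:
            break                          # converged: no vertex changed label
    return labels
```

Calling the function with different seeds on the same graph may return different partitions, which is exactly the situation consensus clustering is meant to handle.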

In Fig. 2 we show the results of our tests. Each panel reports the 
value of the Normalized Mutual Information (NMI) between the 
planted partition of the benchmark and the one found by the algorithm 
as a function of the mixing parameter μ, which is a measure of 
the degree of fuzziness of the clusters. Low values of μ correspond to 
well-separated clusters, which are fairly easy to detect; as μ 
increases, communities get more mixed and clustering algorithms have more 
difficulty distinguishing them from each other. As a consequence, 
all curves display a decreasing trend. The NMI equals 1 if the two 
partitions to compare are identical, and approaches 0 if they are very 
different. In Figs. 2a and 2b the benchmark graphs consist of 1000 
and 5000 vertices, respectively. Each point corresponds to an average 
over 100 different graph realizations. For every realization we have 
produced 150 partitions with the chosen algorithm. The curve 
"Original" shows the average of the NMI between each partition 
and the planted partition. The curve "Consensus" reports the NMI 
between the consensus and the planted partition, where the former 
has been derived from the 150 input partitions. We do not show the 
results for Infomap and OSLOM because their performance on the 
LFR benchmark graphs is very good already 16,24, so it could not be 



sensibly improved by means of consensus clustering (we have verified 
that there still is a small improvement, though). The procedures 
to set the number of runs and the value of the threshold τ for each 
method are detailed in the Supplementary Information (SI) (Figs. S1 
and S2). In all cases, consensus clustering leads to better partitions 
than those of the original method. The improvement is particularly 
impressive for the method by Clauset et al.: the latter is known to 
have a poor performance on the LFR benchmark 16 , and yet in an 
intermediate range of values of the mixing parameter μ it is able to 
detect the right partition by composing the results of individual runs. 
For small μ the algorithm delivers rather stable results, so the consensus 
partition still differs significantly from the planted partition of 
the benchmark. In the Supplementary Information we give a mathematical 
argument to show why consensus clustering is so effective 
on the LFR benchmark (Figs. S3 and S4). 
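
For concreteness, the similarity measure used in these tests can be sketched as follows. The sketch assumes the common normalization NMI = 2 I(X;Y) / (H(X) + H(Y)) and that both partitions cover the same vertex set; the function name `nmi` and the dictionary representation are illustrative choices, not part of the original analysis.

```python
from collections import Counter
from math import log


def nmi(part_a, part_b):
    """Normalized Mutual Information between two partitions.

    part_a and part_b are dicts vertex -> cluster label over the same vertices.
    Assumed normalization: 2 * I(A;B) / (H(A) + H(B)).
    """
    vertices = list(part_a)
    n = len(vertices)
    count_a = Counter(part_a[v] for v in vertices)
    count_b = Counter(part_b[v] for v in vertices)
    joint = Counter((part_a[v], part_b[v]) for v in vertices)
    # Shannon entropies of the two cluster-size distributions
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    # Mutual information from the joint distribution of cluster labels
    mutual = sum(c / n * log(c * n / (count_a[a] * count_b[b]))
                 for (a, b), c in joint.items())
    if h_a + h_b == 0:
        return 1.0   # both partitions consist of a single cluster
    return 2 * mutual / (h_a + h_b)
```

The value depends only on how vertices are grouped, not on the names of the labels, so two partitions that differ by a relabeling of the clusters have NMI equal to 1.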

Stability. Another major advantage of consensus clustering is the fact 
that it leads to stable partitions 38 . Here we verify how stability varies 
with the number of input runs r. In Figs. 3 and 4 we present stability 
plots for two real world datasets: the neural network of C. elegans 49,50 
(453 vertices, 2 050 edges); the citation network of papers published 
in journals of the American Physical Society (APS) (445 443 vertices, 
4 505 730 directed edges). Each figure shows two curves: the average 
NMI between best partitions (circles); the average NMI between 
consensus partitions (squares). Both the best and the consensus 
partition are computed for r input runs, and the procedure is 
repeated for 20 sequences of r runs. So we end up having 20 
best partitions and 20 consensus partitions. The values reported 
are then averages over all possible pairs that one can have out of 
20 numbers. Each of the six panels corresponds to a specific 
clustering algorithm. To derive the consensus partitions we used 
the same values of the threshold parameter τ as in the tests of 
Fig. 2a (for Infomap and OSLOM τ = 0.5). 

As "best" partition for Louvain, SA and Clauset et al. we take the 
one with largest modularity. This sounds like the most natural 
choice, since such methods aim at maximizing modularity. For the 
LPM there is no way to determine which partition could be consid- 
ered the best, so we took the one with maximal modularity as well. 
On the other hand, both Infomap and OSLOM have the option to 
select the best partition out of a set of r runs. 
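
The selection of the "best" partition by modularity can be sketched as below, assuming the standard weighted form of the Newman-Girvan modularity, Q = Σ_c [w_c / W − (s_c / 2W)^2], where W is the total edge weight, w_c the weight of the edges inside cluster c and s_c the total strength of its vertices. The function names and the edge-list format are assumptions of this example.

```python
def modularity(edges, partition):
    """Weighted modularity of a partition of an undirected graph.

    edges: iterable of (u, v, weight), each undirected edge listed once.
    partition: dict vertex -> cluster label.
    """
    total = 0.0       # W: total edge weight
    internal = {}     # w_c: weight of edges inside each cluster
    strength = {}     # s_c: sum of vertex strengths in each cluster
    for u, v, w in edges:
        total += w
        cu, cv = partition[u], partition[v]
        strength[cu] = strength.get(cu, 0.0) + w
        strength[cv] = strength.get(cv, 0.0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0.0) + w
    return sum(internal.get(c, 0.0) / total - (s / (2 * total)) ** 2
               for c, s in strength.items())


def best_partition(edges, partitions):
    """Return the partition with the largest modularity among the given runs."""
    return max(partitions, key=lambda p: modularity(edges, p))
```

With r runs of a stochastic method, `best_partition` picks a single output and discards the others, whereas the consensus approach uses the information in all of them.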

In Fig. 3 we show the stability plot for C. elegans. For all methods 
the consensus partition turns out to be more stable than the best 
Figure 2 | Consensus clustering on the LFR benchmark. The dots indicate the performance of the original method, the squares that obtained with 
consensus clustering. The parameters of the LFR benchmark graphs are: average degree ⟨k⟩ = 20, maximum degree k_max = 50, minimum community size 
c_min = 10, maximum community size c_max = 50, degree exponent τ_1 = 2, community size exponent τ_2 = 3. Each panel corresponds to a 
clustering algorithm, indicated by the label. The two sets of plots correspond to networks with 1000 (a) and 5000 (b) vertices. 
Figure 3 | Stability plot for the neural network of C. elegans. The network has 453 vertices and 2050 edges. 



partition. The only exception is the method by Clauset et al., but the 
two curves are rather close to each other. We remark that increasing 
the number of input runs does not necessarily imply more stable 
partitions. In the cases of LPM and OSLOM, for instance, the best 
partitions of the method become less stable for r ~ 10. On the other 
hand, the stability of the consensus partition is monotonically 
increasing for all six algorithms. 

In Fig. 4 we see the corresponding plot for the APS dataset. The 
analysis of the full dataset is too computationally expensive, so we 
focused on a subset, that of papers published in 1960, along with the 
papers cited by them. The resulting network has 5 696 vertices and 
8 634 edges. Again, we see that the stability of the consensus partition 
grows monotonically with the number of input runs r, and it remains 
higher than that of the best partition. 



In the Supplementary Information we show that the consensus 
partition is not only more stable, but it also has higher fidelity than 
the individual input partitions it combines (Figs. S5 and S6). 

Dynamic communities. Consensus clustering is a powerful tool to 
explore the dynamics of community structure as well. Here we show 
that it is able to monitor the history of the citation network of the 
APS, and to follow birth, growth, fragmentation, decay and death of 
scientific topics. The procedure to derive the consensus partitions out 
of time snapshots of a network is described in the Methods. 

The evolution of the APS dataset is shown in Fig. 5a. The system 
is too large to be meaningfully displayed in a single figure, so we 
focused on the evolution of communities of papers in Statistical 
Physics. For that, we selected only the clusters whose papers include 
Figure 4 | Stability plot for the citation network of papers published in journals of the American Physical Society (APS). The original dataset is too large 
to get results in a reasonable time, so the plot refers to the subset containing all papers published in 1960 and the ones cited by them. 
Criticality, Fractal, Ising, Network and Renormalization among the 
15 most frequent words in their titles. Each vertical bar corresponds 
to a time window of 5 years (see Methods), its length to the size of 
the system. The time ranges from 1945 until 2008. The evolution 
is characterized by alternating phases of expansion and contraction, 
although in the long term there is a growing tendency in the number 
of papers. This is due to the fact that the keywords we selected were 
fashionable in different historical phases of the development of 
Statistical Physics, so some of them became obsolete after some time 
(i.e., there are fewer papers with those keywords), while at the same 
time others became more fashionable. Communities are identified by 
the colors. Pairs of matching clusters in consecutive times are marked 
by the same color. Clusters of consecutive time windows sharing 
papers are joined by links, whose width is proportional to the number 
of common papers. We mark the clusters corresponding to famous 
topics in Statistical Physics, indicating the most frequent words 
appearing in the titles of the papers of each cluster. One can spot 
the emergence of new fields, like Self-Organized Criticality, Spin 
Glasses and Complex Networks. 

In Fig. 5b we consider only papers with the words Network or 
Networks among the 15 most frequent words in their titles. Here 
we can observe the genesis of the fields Neural Networks and 
Complex Networks. In order to have clearer pictures, in Fig. 5a we 
only plotted clusters that have at least 50 papers, while in Fig. 5b the 
threshold is 10 papers. 

For a quantitative assessment of the birth, evolution and death of 
topics, we keep track of each cluster matching it with the most similar 
module in the following time frame (see Methods). This allows us to 
compute one sequence for each cluster, which reports its size for all 
the years when the community was present. In Fig. 6 we computed 
the statistics of these sequences, centering them on the year when the 
cluster reached its peak (reference year 0). To obtain smooth pat- 
terns, clusters are aggregated in bins according to their peak mag- 
nitude. Fig. 6 shows the average cluster size for each bin as a function 
of the years from the peak. We computed the curves using Infomap 
(left) and OSLOM (right). Around the peak, the cluster sizes are 
highly heterogeneous, with some important topics reaching almost 
1000 papers at the peak (for Infomap). The rise and decline of topics 
take place around 10 years before and after the peak, with a remark- 
ably symmetric pattern with respect to the maximum. 
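
The aggregation behind these curves can be sketched as follows. Each cluster is assumed to be described by a dictionary mapping a year to its size in that year; sequences are shifted so that the peak falls at year 0, and then averaged within bins of peak size. The names `align_on_peak` and `average_profiles`, and the way bin edges are passed, are assumptions of this example rather than a description of the actual analysis code.

```python
from bisect import bisect_right
from collections import defaultdict


def align_on_peak(sizes_by_year):
    """Shift a {year: size} sequence so that the peak year becomes 0.

    If several years share the maximum, the first one in the dict is used.
    """
    peak_year = max(sizes_by_year, key=sizes_by_year.get)
    return {year - peak_year: size for year, size in sizes_by_year.items()}


def average_profiles(sequences, bin_edges):
    """Average cluster size per offset from the peak, for each peak-size bin.

    sequences: list of {year: size} dicts, one per tracked cluster.
    bin_edges: sorted list of peak sizes separating the bins.
    Returns {bin index: {offset from peak: average size}}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        b = bisect_right(bin_edges, max(seq.values()))   # bin of the peak size
        for offset, size in align_on_peak(seq).items():
            sums[b][offset] += size
            counts[b][offset] += 1
    return {b: {o: sums[b][o] / counts[b][o] for o in sums[b]} for b in sums}
```

Offsets for which a cluster did not exist simply do not contribute to the average at that offset.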



Discussion 

Consensus clustering is an invaluable tool to cope with the stochastic 
fluctuations in the results of clustering techniques. We have seen that 
the integration of consensus clustering with popular existing tech- 
niques leads to more accurate partitions than the ones delivered by 
the methods alone, in artificial graphs with planted community 
structure. This holds even for methods whose direct application gives 
poor results on the same graphs. In this way it is possible to fully 
exploit the power of each method and the diversity of the partitions, 
rather than being a problem, becomes a factor of performance 
enhancement. 

Finding a consensus between different partitions also offers a nat- 
ural solution to the problem of detecting communities in dynamic 
networks. Here one combines partitions corresponding to snapshots 
of the system, in overlapping time windows. Results depend on the 
choice of the amplitude of the time windows and on the number of 
snapshots combined in the same consensus partition. The choice of 
these parameters may be suggested by the specific system at study. It 
is usually possible to identify a meaningful time scale for the evolu- 
tion of the system. In those cases both the size of the time windows 
and the number of snapshots to combine can be selected accordingly. 
As a safe guideline one should avoid merging partitions referring to a 
time range which is much broader than the natural time scale of the 
network. A good policy is to explore various possibilities and see if 
results are robust within ample ranges of reasonable values for the 
parameters. Additional complications arise from the fact that the 
evolution of the system may not be linear in time, so that it cannot 
be followed in terms of standard time units. In citation networks, like 
the one we studied, it is known that the number of published papers 
has been increasing exponentially in time. Therefore, a fixed time 
window would cover many more events (i.e. published papers and 
mutual citations) if it refers to a recent period than to some decades 
ago. In those cases, a natural choice could be to consider snapshots 
covering time windows of decreasing size. 

Methods 

The consensus matrix. Let us suppose that we wish to combine n_P partitions found 
by a clustering algorithm on a network with n vertices. The consensus matrix D is 
an n × n matrix, whose entry D_ij indicates the number of partitions in which vertices i 
and j of the network were assigned to the same cluster, divided by the number of 



Figure 5 | Time evolution of clusters in the APS citation network. In (a) we selected all the clusters that have at least one of the keywords Criticality, 
Fractal, Ising, Network and Renormalization among the top 15 most frequent words appearing in the title of the papers, while in (b) we just filtered the 
keyword Network(s). Both diagrams were obtained using Infomap on snapshots spanning each a window of 5 years, except at the right end of each 
diagram: since there is no data after 2008, the last windows must have 2008 as upper limit, so their size shrinks (2004 - 2008, 2005 - 2008, 2006 - 2008, 
2007 - 2008). Consensus is computed by combining pairs of consecutive snapshots (see Methods). A color uniquely identifies a module, while the width 
of the links between clusters is proportional to the number of papers they have in common. In (b) we observe the rapid growth of the field Complex 
Networks, which eventually splits into a number of smaller subtopics, like Community Structure, Epidemic Spreading, Robustness, etc. 

Figure 6 | Evolution of average size of clusters. The time ranges of the evolution of the communities have been shifted such that the year when a cluster 
reaches its maximum is 0. The two panels show the results obtained with Infomap (left) and OSLOM (right). The data are aggregated in four bins, 
according to the maximum size reached by the cluster. The phases of growth and decay of fields appear rather symmetric. 



partitions n_P. The matrix D is usually much denser than the adjacency matrix A of 
the original network, because in the consensus matrix there is an edge between any 
two vertices which have cooccurred in the same cluster at least once. On the other 
hand, the weights are large only for those vertices which are most frequently co- 
clustered, whereas low weights indicate that the vertices are probably at the boundary 
between different (real) clusters, so their classification in the same cluster is unlikely 
and essentially due to noise. We wish to maintain the large weights and to drop 
the low ones, therefore a filtering procedure is in order. Among other things, in the 
absence of filtering the consensus matrix would quickly grow into a very dense matrix, 
which would make the application of any clustering algorithm computationally 
expensive. 

We discard all entries of D below a threshold τ. We stress that there might be 
some noisy vertices whose edges could all be below the threshold, and they would 
no longer be connected. When this happens, we just connect them to their 
neighbors with highest weights, to keep the graph connected all along the procedure. 
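
The construction and filtering of the consensus matrix described above can be sketched as follows. The sparse dictionary representation, the function names and the variable `tau` for the threshold are assumptions of this example. In particular, a vertex left without edges is reconnected here to a single strongest neighbor, with ties broken by the ordering of the vertex pairs; this is one possible reading of the reconnection rule, not necessarily the one adopted in our calculations.

```python
from itertools import combinations


def consensus_matrix(partitions):
    """Sparse consensus matrix of a list of partitions.

    partitions: list of dicts vertex -> label over the same vertex set.
    Returns {(i, j): D_ij} with i < j, for pairs co-clustered at least once.
    Vertices must be mutually comparable (e.g. all integers).
    """
    n_p = len(partitions)
    counts = {}
    for part in partitions:
        clusters = {}
        for v, c in part.items():
            clusters.setdefault(c, []).append(v)
        for members in clusters.values():
            for pair in combinations(sorted(members), 2):
                counts[pair] = counts.get(pair, 0) + 1
    return {pair: c / n_p for pair, c in counts.items()}


def apply_threshold(matrix, vertices, tau):
    """Drop entries below tau, then reconnect vertices that lost all edges."""
    kept = {pair: w for pair, w in matrix.items() if w >= tau}
    connected = {v for pair in kept for v in pair}
    for v in vertices:
        if v in connected:
            continue
        # strongest discarded entry touching v (assumed reconnection rule)
        incident = [(w, pair) for pair, w in matrix.items() if v in pair]
        if incident:
            w, pair = max(incident)
            kept[pair] = w
    return kept
```

A vertex that never shares a cluster with any other vertex has no entries at all in the matrix and therefore stays isolated in this sketch.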

Next we apply the same clustering algorithm to D and produce another set of 
partitions, which is then used to construct a new consensus matrix D', as described 
above. The procedure is iterated until the consensus matrix turns into a block-diagonal 
matrix D_final, whose weights equal 1 for vertices in the same block and 0 for 
vertices in different blocks. The matrix D_final delivers the community structure of the 
original network. In our calculations typically one iteration is sufficient to lead to 
stable results. We remark that in order to use the same clustering method all along, the 
latter has to be able to detect clusters in weighted networks, since the consensus 
matrix is weighted. This is a necessary constraint on the choice of the methods for 
which one could use the procedure proposed here. However, it is not a severe lim- 
itation, as most clustering algorithms in the literature can handle weighted networks 
or can be trivially extended to deal with them. 

We close by summarizing the procedure, step by step. The starting point is a 
network Q with n vertices and a clustering algorithm A. 

1. Apply A on Q n_P times, so as to yield n_P partitions. 

2. Compute the consensus matrix D, where D_ij is the number of partitions in 
which vertices i and j of Q are assigned to the same cluster, divided by n_P. 

3. All entries of D below a chosen threshold τ are set to zero. 

4. Apply A on D n_P times, so as to yield n_P partitions. 

5. If the partitions are all equal, stop (the consensus matrix would be 
block-diagonal). Otherwise go back to 2. 
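
The steps above can be sketched as a short driver loop. The clustering algorithm is passed in as a callable `cluster(graph, rng)` returning a dict vertex -> label; this interface, the function names and the `max_iter` safeguard are assumptions of the example. As a small simplification, the loop stops right away if the runs on the original network already agree, and a vertex that loses all its edges is reconnected to one strongest neighbor only.

```python
import random
from itertools import combinations


def canonical(part):
    """Label-independent form of a partition: a frozenset of vertex blocks."""
    groups = {}
    for v, c in part.items():
        groups.setdefault(c, set()).add(v)
    return frozenset(frozenset(g) for g in groups.values())


def consensus_graph(partitions, vertices, tau):
    """Thresholded consensus matrix as a weighted adjacency dict (steps 2-3)."""
    n_p = len(partitions)
    counts = {}
    for part in partitions:
        for block in canonical(part):
            for pair in combinations(sorted(block), 2):
                counts[pair] = counts.get(pair, 0) + 1
    weight = {pair: c / n_p for pair, c in counts.items()}
    graph = {v: {} for v in vertices}
    for (i, j), w in weight.items():
        if w >= tau:
            graph[i][j] = graph[j][i] = w
    for v in vertices:                     # reconnect vertices left without edges
        if not graph[v]:
            incident = [(w, pair) for pair, w in weight.items() if v in pair]
            if incident:
                w, (i, j) = max(incident)
                graph[i][j] = graph[j][i] = w
    return graph


def consensus_clustering(adj, cluster, n_p=20, tau=0.5, max_iter=50, seed=None):
    """Iterate clustering and consensus until all n_p partitions coincide."""
    rng = random.Random(seed)
    vertices = list(adj)
    graph = adj
    for _ in range(max_iter):
        partitions = [cluster(graph, rng) for _ in range(n_p)]   # steps 1 and 4
        if len({canonical(p) for p in partitions}) == 1:         # step 5
            return partitions[0]
        graph = consensus_graph(partitions, vertices, tau)       # steps 2 and 3
    return partitions[0]    # not converged within max_iter; treat with care
```

Any method able to handle weighted graphs can be plugged in as `cluster`, since from the second iteration on it is applied to the weighted consensus graph rather than to the original network.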

Consensus for dynamic clusters. In the case of temporal networks, the dynamics 
of the system is represented as a succession of snapshots, corresponding to 
overlapping time windows. Let us suppose we have m windows of size Δt for a time 
range going from t_0 to t_m. We separate them as [t_0, t_0 + Δt], [t_0 + 1, t_0 + Δt + 1], 
[t_0 + 2, t_0 + Δt + 2], ..., [t_m − Δt, t_m]. Each time window is shifted by one time unit to 
the right with respect to the previous one. The idea is to derive the consensus partition 
from subsets of r consecutive snapshots, with r suitably chosen. One starts by 
combining the first r snapshots, then those from 2 to r + 1, and so on until the interval 
spanned by the last r snapshots. In our calculations for the APS citation network we 
took Δt = 5 (years), r = 2. 
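
The window scheme can be written down in a few lines. The sketch below assumes integer time units and windows of fixed size; it does not reproduce the shrinking windows used at the right end of the APS data. The function names are illustrative.

```python
def sliding_windows(t0, tm, dt):
    """Overlapping windows [t0, t0+dt], [t0+1, t0+dt+1], ..., [tm-dt, tm]."""
    return [(start, start + dt) for start in range(t0, tm - dt + 1)]


def snapshot_groups(windows, r):
    """Groups of r consecutive windows: the first r, then 2 to r+1, and so on."""
    return [windows[k:k + r] for k in range(len(windows) - r + 1)]
```

Each group returned by `snapshot_groups` corresponds to one consensus partition, built from the partitions found on the snapshots of that group.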

There are two sources of fluctuations: 1) the ones coming from the different 
partitions delivered by the chosen clustering technique for a given snapshot; 2) the 
ones coming from the fact that the structure of the network is changing in time. The 
entries D_ij of the consensus matrix are obtained by computing the number of times 
vertices i and j are clustered together, and dividing it by the number of partitions 
corresponding to snapshots including both vertices. This looks like a more sensible 
choice with respect to the one we had adopted in the static case (when we took the 
total number of partitions used as input for the consensus matrix), as in the 



evolution of a temporal network new vertices may join the system and old ones may 
disappear. 
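
The modified normalization can be sketched as follows, assuming each input partition is a dict covering only the vertices present in its snapshot. The function name `dynamic_consensus` and the sparse output format are choices made for this example; the loop over all vertex pairs is quadratic and only meant to make the definition explicit.

```python
from itertools import combinations


def dynamic_consensus(partitions):
    """Consensus entries normalized by the number of partitions containing both vertices.

    partitions: list of dicts vertex -> label; vertex sets may differ.
    Returns {(i, j): D_ij} with i < j, for pairs co-clustered at least once.
    """
    together = {}   # times i and j are in the same cluster
    present = {}    # times i and j both appear in a partition
    for part in partitions:
        for i, j in combinations(sorted(part), 2):
            present[(i, j)] = present.get((i, j), 0) + 1
            if part[i] == part[j]:
                together[(i, j)] = together.get((i, j), 0) + 1
    return {pair: c / present[pair] for pair, c in together.items()}
```

With the static normalization, a pair of vertices that coexist in only a few snapshots would receive a low weight even if they are always clustered together when both are present; dividing by the co-presence count avoids this.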

Once the consensus partitions for each time step have been derived, there is the 
problem of relating clusters at different times. We need a quantitative criterion to 
establish whether a cluster C_{t+1} at time t + 1 is the evolution of a cluster C_t at time t. 
The correspondence is not trivial: a cluster may fragment, and thus there would be 
many "children" clusters at time t + 1 for the same cluster at time t. In order to assign 
to each cluster C_t of the consensus partition at time t one and only one cluster of 
the consensus partition V_{t+1} at time t + 1 we compute the Jaccard index 51 between C_t 
and every cluster of V_{t+1}, and pick the one which yields the largest value. The Jaccard 
index J(A, B) between two sets A and B equals 

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}. \qquad (1)$$

In our case, since the snapshots generating the partitions refer to different moments of 
the life of the system and may not contain the same elements, the Jaccard index is 
computed by excluding from either cluster the vertices which are not present in both 
partitions. The same procedure is followed to assign to each cluster C_{t+1} of the 
consensus partition at time t + 1 one and only one cluster of the consensus partition 
V_t at time t. In general, if cluster A at time t is the best match of cluster B at time t + 1, 
the latter may not be the best match of A. If it is, then we use the same color for 
both clusters. Otherwise there is a discontinuity in the evolution of A, which stops at t, 
and its best match at time t + 1 will be considered as a newly born cluster. 
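
The matching rule can be sketched as follows. Clusters are assumed to be given as dicts from a cluster name to a set of vertices; the function names are illustrative. As an additional assumption of this example, a cluster whose best Jaccard index is zero is left unmatched.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets (0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(source, targets, common):
    """For each source cluster, the target cluster with the largest Jaccard index."""
    result = {}
    for name, members in source.items():
        # only vertices present in both partitions are compared
        scores = {other: jaccard(members & common, group & common)
                  for other, group in targets.items()}
        top = max(scores, key=scores.get, default=None)
        result[name] = top if top is not None and scores[top] > 0 else None
    return result


def match_clusters(clusters_t, clusters_t1):
    """Forward and backward best matches, plus the mutually matched pairs."""
    present_t = set().union(*clusters_t.values())
    present_t1 = set().union(*clusters_t1.values())
    common = present_t & present_t1
    forward = best_match(clusters_t, clusters_t1, common)
    backward = best_match(clusters_t1, clusters_t, common)
    # a cluster keeps its identity (color) only if the match is mutual
    continued = [(a, b) for a, b in forward.items()
                 if b is not None and backward[b] == a]
    return forward, backward, continued
```

Clusters at time t + 1 that do not appear in `continued` are the ones treated as newly born.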